Histopathology Research Template 🔬
Describe Materials and Methods as highlighted in (Knijn, Simmer, and Nagtegaal 2015).
Describe patient characteristics, and inclusion and exclusion criteria
Describe treatment details
Describe the type of material used
Specify how expression of the biomarker was assessed
Describe the number of independent (blinded) scorers and how they scored
State the method of case selection, study design, origin of the cases, and time frame
Describe the end of the follow-up period and median follow-up time
Define all clinical endpoints examined
Specify all applied statistical methods
Describe how interactions with other clinical/pathological factors were analyzed
Code for general settings.
Set up global chunk settings
knitr::opts_chunk$set(
eval = TRUE,
echo = TRUE,
fig.path = here::here("figs/"),
message = FALSE,
warning = FALSE,
error = FALSE,
cache = FALSE,
comment = NA,
tidy = TRUE,
fig.width = 6,
fig.height = 4
)
Load Library
See R/loadLibrary.R for the libraries loaded.
Code for generating fake data.
Generate Fake Data
This code generates fake histopathological data.
Use this code to generate fake clinicopathologic data
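The generator itself is not shown in this report, so the following is only a minimal base-R sketch of how such a table could be simulated. The column names follow the variables reported later in this document; the seed, the sampling probabilities, and the object name `fake_data` are illustrative assumptions.

```r
# Hypothetical generator: column names follow the report, but the seed and
# the sampling probabilities are illustrative assumptions.
set.seed(2019)  # fixed seed so the fake data are reproducible
n <- 250

fake_data <- data.frame(
  ID    = sprintf("%03d", seq_len(n)),                      # "001", "002", ...
  Sex   = sample(c("Female", "Male"), n, replace = TRUE),
  Age   = sample(25:73, n, replace = TRUE),
  LVI   = sample(c("Absent", "Present"), n, replace = TRUE, prob = c(0.6, 0.4)),
  Grade = sample(c("1", "2", "3"), n, replace = TRUE),
  Death = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3)),
  stringsAsFactors = FALSE
)

# Introduce one missing value to mimic real-world data
fake_data$Age[sample(n, 1)] <- NA

str(fake_data)
```

Fixing the seed keeps the simulated table identical between runs, which makes downstream tables and plots reproducible.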
Code for importing data.
Read the data
library(readxl)
mydata <- readxl::read_excel(here::here("data", "mydata.xlsx"))
# View(mydata) # Use to view data after importing
Add code for importing multiple data files with purrr::reduce.
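One way to combine several imported tables is to fold a list of data frames into a single one with successive joins. The sketch below uses base R `Reduce()` and `merge()` on small in-memory tables so that it runs without extra packages; the table names and values are hypothetical. With the tidyverse loaded, `purrr::reduce(tables, dplyr::full_join, by = "ID")` should give an equivalent result.

```r
# Toy tables standing in for several imported files (hypothetical data)
clinical  <- data.frame(ID = c("001", "002", "003"), Age = c(54, 60, 30))
pathology <- data.frame(ID = c("001", "002", "003"), Grade = c("2", "3", "3"))
followup  <- data.frame(ID = c("001", "003"), Death = c(TRUE, FALSE))

tables <- list(clinical, pathology, followup)

# Fold the list into one data frame: join on ID and keep all rows (full join)
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), tables)
merged
```

Cases missing from one of the tables are kept and receive `NA` in the columns that table contributes.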
Code for reporting general features.
Dataframe Report
The data contains 250 observations of the following variables:
- ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others
- Name: 249 entries: Aansh, n = 1; Abdurahmon, n = 1; Abrah, n = 1 and 246 others (1 missing)
- Sex: 2 entries: Male, n = 126; Female, n = 123 (1 missing)
- Age: Mean = 50.39, SD = 14.06, range = [25, 73], 1 missing
- Race: 7 entries: White, n = 162; Hispanic, n = 39; Black, n = 26 and 4 others (1 missing)
- PreinvasiveComponent: 2 entries: Absent, n = 186; Present, n = 63 (1 missing)
- LVI: 2 entries: Absent, n = 157; Present, n = 92 (1 missing)
- PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
- Death: 2 levels: FALSE (n = 73); TRUE (n = 176) and missing (n = 1)
- Group: 2 entries: Control, n = 131; Treatment, n = 118 (1 missing)
- Grade: 3 entries: 3, n = 101; 1, n = 83; 2, n = 65 (1 missing)
- TStage: 4 entries: 4, n = 101; 3, n = 66; 2, n = 50 and 1 other (1 missing)
- Anti-X-intensity: Mean = 2.43, SD = 0.64, range = [1, 3], 1 missing
- Anti-Y-intensity: Mean = 2.06, SD = 0.78, range = [1, 3], 1 missing
- LymphNodeMetastasis: 2 entries: Absent, n = 153; Present, n = 96 (1 missing)
- Valid: 2 levels: FALSE (n = 137); TRUE (n = 112) and missing (n = 1)
- Smoker: 2 levels: FALSE (n = 125); TRUE (n = 124) and missing (n = 1)
- Grade_Level: 3 entries: high, n = 100; low, n = 77; moderate, n = 72 (1 missing)
- DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101
250 observations with 21 variables
19 variables containing missings (NA)
0 variables with no variance
Code for defining variable types.
Print column names as a vector
c("ID", "Name", "Sex", "Age", "Race", "PreinvasiveComponent",
"LVI", "PNI", "LastFollowUpDate", "Death", "Group", "Grade",
"TStage", "Anti-X-intensity", "Anti-Y-intensity", "LymphNodeMetastasis",
"Valid", "Smoker", "Grade_Level", "SurgeryDate", "DeathTime")
See the code as function in R/find_key.R.
keycolumns <- mydata %>% sapply(., FUN = dataMaid::isKey) %>% as_tibble() %>%
    select(which(.[1, ] == TRUE)) %>% names()
keycolumns
[1] "ID" "Name"
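The idea behind key detection can be sketched in base R: a column is a candidate key when no value repeats. This is an approximation under stated assumptions and not the `dataMaid::isKey` implementation, which may apply additional rules; the helper name `is_candidate_key` and the toy data are hypothetical.

```r
# Toy data (hypothetical values)
df <- data.frame(
  ID  = c("001", "002", "003", "004"),
  Sex = c("Male", "Male", "Female", "Male"),
  Age = c(54, 60, 30, 56),
  stringsAsFactors = FALSE
)

# Assumption: a candidate key is a non-numeric column without repeated values
is_candidate_key <- function(x) !is.numeric(x) && anyDuplicated(x) == 0

key_candidates <- names(df)[vapply(df, is_candidate_key, logical(1))]
key_candidates
```

Numeric columns are excluded here so that a measurement that happens to be unique in a small sample is not mistaken for an identifier.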
Get variable types
# A tibble: 4 x 4
type cnt pcnt col_name
<chr> <int> <dbl> <list>
1 character 11 57.9 <chr [11]>
2 logical 3 15.8 <chr [3]>
3 numeric 3 15.8 <chr [3]>
4 POSIXct POSIXt 2 10.5 <chr [2]>
mydata %>% select(-keycolumns, -contains("Date")) %>% describer::describe() %>% knitr::kable(format = "markdown")
| .column_name | .column_class | .column_type | .count_elements | .mean_value | .sd_value | .q0_value | .q25_value | .q50_value | .q75_value | .q100_value |
|---|---|---|---|---|---|---|---|---|---|---|
| Sex | character | character | 250 | NA | NA | Female | NA | NA | NA | Male |
| Age | numeric | double | 250 | 50.389558 | 14.0570859 | 25 | 38 | 50 | 63 | 73 |
| Race | character | character | 250 | NA | NA | Asian | NA | NA | NA | White |
| PreinvasiveComponent | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
| LVI | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
| PNI | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
| Death | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
| Group | character | character | 250 | NA | NA | Control | NA | NA | NA | Treatment |
| Grade | character | character | 250 | NA | NA | 1 | NA | NA | NA | 3 |
| TStage | character | character | 250 | NA | NA | 1 | NA | NA | NA | 4 |
| Anti-X-intensity | numeric | double | 250 | 2.429719 | 0.6382312 | 1 | 2 | 3 | 3 | 3 |
| Anti-Y-intensity | numeric | double | 250 | 2.060241 | 0.7779636 | 1 | 1 | 2 | 3 | 3 |
| LymphNodeMetastasis | character | character | 250 | NA | NA | Absent | NA | NA | NA | Present |
| Valid | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
| Smoker | logical | logical | 250 | NA | NA | FALSE | NA | NA | NA | TRUE |
| Grade_Level | character | character | 250 | NA | NA | high | NA | NA | NA | moderate |
| DeathTime | character | character | 250 | NA | NA | MoreThan1Year | NA | NA | NA | Within1Year |
Plot variable types
# https://github.com/ropensci/visdat
# http://visdat.njtierney.com/articles/using_visdat.html
# https://cran.r-project.org/web/packages/visdat/index.html
# http://visdat.njtierney.com/
# visdat::vis_guess(mydata)
visdat::vis_dat(mydata)
Character variables
characterVariables <- mydata %>% select(-keycolumns) %>% inspectdf::inspect_types() %>%
    dplyr::filter(type == "character") %>% dplyr::select(col_name) %>% pull() %>%
    unlist()
characterVariables
 [1] "Sex" "Race" "PreinvasiveComponent"
 [4] "LVI" "PNI" "Group"
 [7] "Grade" "TStage" "LymphNodeMetastasis"
[10] "Grade_Level" "DeathTime"
Categorical variables
categoricalVariables <- mydata %>% dplyr::select(-keycolumns, -contains("Date")) %>%
    describer::describe() %>% janitor::clean_names() %>% dplyr::filter(column_type ==
    "factor") %>% dplyr::select(column_name) %>% dplyr::pull()
categoricalVariables
character(0)
Continuous variables
continiousVariables <- mydata %>% dplyr::select(-keycolumns, -contains("Date")) %>%
    describer::describe() %>% janitor::clean_names() %>% dplyr::filter(column_type ==
    "numeric" | column_type == "double") %>% dplyr::select(column_name) %>% dplyr::pull()
continiousVariables
[1] "Age" "Anti-X-intensity" "Anti-Y-intensity"
Numeric variables
numericVariables <- mydata %>% select(-keycolumns) %>% inspectdf::inspect_types() %>%
    dplyr::filter(type == "numeric") %>% dplyr::select(col_name) %>% pull() %>% unlist()
numericVariables
[1] "Age" "Anti-X-intensity" "Anti-Y-intensity"
Integer variables
integerVariables <- mydata %>% select(-keycolumns) %>% inspectdf::inspect_types() %>%
    dplyr::filter(type == "integer") %>% dplyr::select(col_name) %>% pull() %>% unlist()
integerVariables
NULL
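The grouping of columns by type can also be sketched without extra packages. This is a base-R approximation of the idea, not the `inspectdf` or `describer` implementation; the toy data and the name `vars_by_class` are hypothetical.

```r
# Toy data with one column per storage type (hypothetical values)
df <- data.frame(
  Sex   = c("Male", "Female"),
  Age   = c(54, 30),
  Count = c(1L, 2L),
  Death = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# First class of every column, then column names grouped by that class
col_class <- vapply(df, function(x) class(x)[1], character(1))
vars_by_class <- split(names(col_class), col_class)

vars_by_class$character
vars_by_class$numeric
```

Looking up a class that is absent from the data, such as `vars_by_class$factor` here, returns `NULL`, which mirrors the empty results for categorical and integer variables above.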
Code for overviewing the data.
reactable::reactable(data = mydata, sortable = TRUE, resizable = TRUE, filterable = TRUE,
    searchable = TRUE, pagination = TRUE, paginationType = "numbers", showPageSizeOptions = TRUE,
    highlight = TRUE, striped = TRUE, outlined = TRUE, compact = TRUE, wrap = FALSE,
    showSortIcon = TRUE, showSortable = TRUE)
Summary of Data via summarytools 📦
if (!dir.exists(here::here("out"))) {
dir.create(here::here("out"))
}
summarytools::view(x = summarytools::dfSummary(mydata %>% select(-keycolumns)), file = here::here("out",
    "mydata_summary.html"))
Summary via dataMaid 📦
if (!dir.exists(here::here("out"))) {
dir.create(here::here("out"))
}
dataMaid::makeDataReport(data = mydata, file = here::here("out", "dataMaid_mydata.Rmd"),
    replace = TRUE, openResult = FALSE, render = FALSE, quiet = TRUE)
Summary via explore 📦
if (!dir.exists(here::here("out"))) {
dir.create(here::here("out"))
}
mydata %>% select(-dateVariables) %>% explore::report(output_file = "mydata_report.html",
    output_dir = here::here("out"))
Glimpse of Data
Observations: 250
Variables: 17
$ Sex <chr> "Male", "Male", "Female", "Male", "Female", "Mal…
$ Age <dbl> 54, 60, 30, 56, 45, 69, 61, 60, 39, 65, 63, 72, …
$ Race <chr> "Hispanic", "White", "White", "White", "White", …
$ PreinvasiveComponent <chr> "Absent", "Present", "Absent", "Present", "Prese…
$ LVI <chr> "Present", "Present", "Absent", "Present", "Abse…
$ PNI <chr> "Absent", "Absent", "Present", "Absent", "Presen…
$ Death <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE…
$ Group <chr> "Control", "Control", "Control", "Control", "Con…
$ Grade <chr> "2", "3", "3", "3", "2", "1", "1", "3", "1", "1"…
$ TStage <chr> "3", "3", "2", "1", "4", "4", "4", "3", "2", "1"…
$ `Anti-X-intensity` <dbl> 3, 2, 2, 3, 3, 2, 3, 3, 3, 3, 2, 2, 1, 2, 3, 1, …
$ `Anti-Y-intensity` <dbl> 2, 2, 2, 2, 1, 2, 3, 1, 3, 2, 2, 3, 3, 1, 3, 3, …
$ LymphNodeMetastasis <chr> "Absent", "Absent", "Absent", "Absent", "Absent"…
$ Valid <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
$ Smoker <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, T…
$ Grade_Level <chr> "moderate", "high", "moderate", "moderate", "low…
$ DeathTime <chr> "Within1Year", "Within1Year", "Within1Year", "Wi…
variable type na na_pct unique min mean max
1 ID chr 0 0.0 250 NA NA NA
2 Name chr 1 0.4 250 NA NA NA
3 Sex chr 1 0.4 3 NA NA NA
4 Age dbl 1 0.4 50 25 50.39 73
5 Race chr 1 0.4 8 NA NA NA
6 PreinvasiveComponent chr 1 0.4 3 NA NA NA
7 LVI chr 1 0.4 3 NA NA NA
8 PNI chr 1 0.4 3 NA NA NA
9 LastFollowUpDate dat 1 0.4 13 NA NA NA
10 Death lgl 1 0.4 3 0 0.71 1
11 Group chr 1 0.4 3 NA NA NA
12 Grade chr 1 0.4 4 NA NA NA
13 TStage chr 1 0.4 5 NA NA NA
14 Anti-X-intensity dbl 1 0.4 4 1 2.43 3
15 Anti-Y-intensity dbl 1 0.4 4 1 2.06 3
16 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
17 Valid lgl 1 0.4 3 0 0.45 1
18 Smoker lgl 1 0.4 3 0 0.50 1
19 Grade_Level chr 1 0.4 4 NA NA NA
20 SurgeryDate dat 1 0.4 221 NA NA NA
21 DeathTime chr 0 0.0 2 NA NA NA
Explore
Check whether the data match expectations
visdat::vis_expect(data = mydata, expectation = ~.x == -1, show_perc = TRUE)
visdat::vis_expect(mydata, ~.x >= 25)
See missing values
$variables
Variable q qNA pNA qZero pZero qBlank pBlank qInf pInf
1 Valid 250 1 0.4% 137 54.8% 0 - 0 -
2 Smoker 250 1 0.4% 125 50% 0 - 0 -
3 Death 250 1 0.4% 73 29.2% 0 - 0 -
4 Sex 250 1 0.4% 0 - 0 - 0 -
5 PreinvasiveComponent 250 1 0.4% 0 - 0 - 0 -
6 LVI 250 1 0.4% 0 - 0 - 0 -
7 PNI 250 1 0.4% 0 - 0 - 0 -
8 Group 250 1 0.4% 0 - 0 - 0 -
9 LymphNodeMetastasis 250 1 0.4% 0 - 0 - 0 -
10 Grade 250 1 0.4% 0 - 0 - 0 -
11 Anti-X-intensity 250 1 0.4% 0 - 0 - 0 -
12 Anti-Y-intensity 250 1 0.4% 0 - 0 - 0 -
13 Grade_Level 250 1 0.4% 0 - 0 - 0 -
14 TStage 250 1 0.4% 0 - 0 - 0 -
15 Race 250 1 0.4% 0 - 0 - 0 -
16 LastFollowUpDate 250 1 0.4% 0 - 0 - 0 -
17 Age 250 1 0.4% 0 - 0 - 0 -
18 SurgeryDate 250 1 0.4% 0 - 0 - 0 -
19 Name 250 1 0.4% 0 - 0 - 0 -
20 DeathTime 250 0 - 0 - 0 - 0 -
21 ID 250 0 - 0 - 0 - 0 -
qDistinct type anomalous_percent
1 3 Logical 55.2%
2 3 Logical 50.4%
3 3 Logical 29.6%
4 3 Character 0.4%
5 3 Character 0.4%
6 3 Character 0.4%
7 3 Character 0.4%
8 3 Character 0.4%
9 3 Character 0.4%
10 4 Character 0.4%
11 4 Numeric 0.4%
12 4 Numeric 0.4%
13 4 Character 0.4%
14 5 Character 0.4%
15 8 Character 0.4%
16 13 Timestamp 0.4%
17 50 Numeric 0.4%
18 221 Timestamp 0.4%
19 250 Character 0.4%
20 2 Character -
21 250 Character -
$problem_variables
[1] Variable q qNA pNA
[5] qZero pZero qBlank pBlank
[9] qInf pInf qDistinct type
[13] anomalous_percent problems
<0 rows> (or 0-length row.names)
================================================================================
[1] "Ignoring variable LastFollowUpDate: Unsupported type for visualization."
[1] "Ignoring variable SurgeryDate: Unsupported type for visualization."
Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99
1 Anti-X-intensity 1 2 2 3 3 3 3
2 Anti-Y-intensity 1 1 1 2 3 3 3
3 Age 26 31 38 50 63 70 73
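The per-variable missing counts and percentages in the output above (columns `qNA` and `pNA`) can be approximated in base R. This is a minimal sketch on toy data; the object names are hypothetical and it is not the code that produced the listing.

```r
# Toy data with a few missing values (hypothetical values)
df <- data.frame(
  Age = c(54, NA, 30, 56),
  Sex = c("Male", "Male", NA, "Female"),
  ID  = c("001", "002", "003", "004"),
  stringsAsFactors = FALSE
)

# Number and percentage of missing values per column
na_count <- colSums(is.na(df))
na_pct   <- round(100 * na_count / nrow(df), 1)

missing_summary <- data.frame(
  variable = names(df),
  q_na     = na_count,
  p_na     = na_pct,
  row.names = NULL
)
missing_summary

# Columns that contain at least one missing value
vars_with_na <- names(na_count)[na_count > 0]
vars_with_na
```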
Summary of Data via DataExplorer 📦
# A tibble: 1 x 9
rows columns discrete_columns continuous_colu… all_missing_col…
<int> <int> <int> <int> <int>
1 250 21 18 3 0
# … with 4 more variables: total_missing_values <int>, complete_rows <int>,
# total_observations <int>, memory_usage <dbl>
Drop columns
Write results as described in (Knijn, Simmer, and Nagtegaal 2015)
Describe the number of patients included in the analysis and reason for dropout
Report patient/disease characteristics (including the biomarker of interest) with the number of missing values
Describe the interaction of the biomarker of interest with established prognostic variables
Include at least 90% of the initial cases in univariate and multivariate analyses
Report the estimated effect (relative risk/odds ratio, confidence interval, and p value) in univariate analysis
Report the estimated effect (hazard ratio/odds ratio, confidence interval, and p value) in multivariate analysis
Report the estimated effects (hazard ratio/odds ratio, confidence interval, and p value) of other prognostic factors included in multivariate analysis
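As an illustration of the reporting items above, the sketch below estimates an odds ratio with a confidence interval and p value from a univariate logistic regression in base R. The 2x2 counts, the variable names `marker` and `event`, and the choice of a Wald interval are assumptions made for the example, not results from this data set.

```r
# Hypothetical 2x2 counts: 30 of 50 events with the marker, 15 of 50 without
toy <- data.frame(
  marker = rep(c(1, 0), times = c(50, 50)),
  event  = c(rep(c(1, 0), times = c(30, 20)),
             rep(c(1, 0), times = c(15, 35)))
)

# Univariate logistic regression of the event on the marker
fit <- glm(event ~ marker, data = toy, family = binomial)

odds_ratio <- exp(coef(fit))["marker"]
conf_int   <- exp(confint.default(fit))["marker", ]   # Wald 95% interval
p_value    <- summary(fit)$coefficients["marker", "Pr(>|z|)"]

round(c(OR = unname(odds_ratio), conf_int, p = p_value), 3)
```

For a multivariate analysis the same pattern applies with further covariates added to the model formula, so that the effect of each included prognostic factor can be reported alongside the biomarker.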
Code for Descriptive Statistics.
Report Data properties via report 📦
The data contains 250 observations of the following variables:
- ID: 250 entries: 001, n = 1; 002, n = 1; 003, n = 1 and 247 others
- Name: 249 entries: Aansh, n = 1; Abdurahmon, n = 1; Abrah, n = 1 and 246 others (1 missing)
- Sex: 2 entries: Male, n = 126; Female, n = 123 (1 missing)
- Age: Mean = 50.39, SD = 14.06, range = [25, 73], 1 missing
- Race: 7 entries: White, n = 162; Hispanic, n = 39; Black, n = 26 and 4 others (1 missing)
- PreinvasiveComponent: 2 entries: Absent, n = 186; Present, n = 63 (1 missing)
- LVI: 2 entries: Absent, n = 157; Present, n = 92 (1 missing)
- PNI: 2 entries: Absent, n = 171; Present, n = 78 (1 missing)
- Death: 2 levels: FALSE (n = 73); TRUE (n = 176) and missing (n = 1)
- Group: 2 entries: Control, n = 131; Treatment, n = 118 (1 missing)
- Grade: 3 entries: 3, n = 101; 1, n = 83; 2, n = 65 (1 missing)
- TStage: 4 entries: 4, n = 101; 3, n = 66; 2, n = 50 and 1 other (1 missing)
- Anti-X-intensity: Mean = 2.43, SD = 0.64, range = [1, 3], 1 missing
- Anti-Y-intensity: Mean = 2.06, SD = 0.78, range = [1, 3], 1 missing
- LymphNodeMetastasis: 2 entries: Absent, n = 153; Present, n = 96 (1 missing)
- Valid: 2 levels: FALSE (n = 137); TRUE (n = 112) and missing (n = 1)
- Smoker: 2 levels: FALSE (n = 125); TRUE (n = 124) and missing (n = 1)
- Grade_Level: 3 entries: high, n = 100; low, n = 77; moderate, n = 72 (1 missing)
- DeathTime: 2 entries: Within1Year, n = 149; MoreThan1Year, n = 101
Table 1 via arsenal 📦
# cat(names(mydata), sep = ' + \n')
library(arsenal)
tab1 <- arsenal::tableby(~Sex + Age + Race + PreinvasiveComponent + LVI + PNI + Death +
Group + Grade + TStage + `Anti-X-intensity` + `Anti-Y-intensity` + LymphNodeMetastasis +
Valid + Smoker + Grade_Level, data = mydata)
summary(tab1)
| | Overall (N=250) |
|---|---|
| Sex | |
| N-Miss | 1 |
| Female | 123 (49.4%) |
| Male | 126 (50.6%) |
| Age | |
| N-Miss | 1 |
| Mean (SD) | 50.390 (14.057) |
| Range | 25.000 - 73.000 |
| Race | |
| N-Miss | 1 |
| Asian | 15 (6.0%) |
| Bi-Racial | 4 (1.6%) |
| Black | 26 (10.4%) |
| Hispanic | 39 (15.7%) |
| Native | 2 (0.8%) |
| Other | 1 (0.4%) |
| White | 162 (65.1%) |
| PreinvasiveComponent | |
| N-Miss | 1 |
| Absent | 186 (74.7%) |
| Present | 63 (25.3%) |
| LVI | |
| N-Miss | 1 |
| Absent | 157 (63.1%) |
| Present | 92 (36.9%) |
| PNI | |
| N-Miss | 1 |
| Absent | 171 (68.7%) |
| Present | 78 (31.3%) |
| Death | |
| N-Miss | 1 |
| FALSE | 73 (29.3%) |
| TRUE | 176 (70.7%) |
| Group | |
| N-Miss | 1 |
| Control | 131 (52.6%) |
| Treatment | 118 (47.4%) |
| Grade | |
| N-Miss | 1 |
| 1 | 83 (33.3%) |
| 2 | 65 (26.1%) |
| 3 | 101 (40.6%) |
| TStage | |
| N-Miss | 1 |
| 1 | 32 (12.9%) |
| 2 | 50 (20.1%) |
| 3 | 66 (26.5%) |
| 4 | 101 (40.6%) |
| Anti-X-intensity | |
| N-Miss | 1 |
| Mean (SD) | 2.430 (0.638) |
| Range | 1.000 - 3.000 |
| Anti-Y-intensity | |
| N-Miss | 1 |
| Mean (SD) | 2.060 (0.778) |
| Range | 1.000 - 3.000 |
| LymphNodeMetastasis | |
| N-Miss | 1 |
| Absent | 153 (61.4%) |
| Present | 96 (38.6%) |
| Valid | |
| N-Miss | 1 |
| FALSE | 137 (55.0%) |
| TRUE | 112 (45.0%) |
| Smoker | |
| N-Miss | 1 |
| FALSE | 125 (50.2%) |
| TRUE | 124 (49.8%) |
| Grade_Level | |
| N-Miss | 1 |
| high | 100 (40.2%) |
| low | 77 (30.9%) |
| moderate | 72 (28.9%) |
Table 1 via tableone 📦
library(tableone)
mydata %>% select(-keycolumns, -dateVariables) %>% tableone::CreateTableOne(data = .)
Overall
n 250
Sex = Male (%) 126 (50.6)
Age (mean (SD)) 50.39 (14.06)
Race (%)
Asian 15 ( 6.0)
Bi-Racial 4 ( 1.6)
Black 26 (10.4)
Hispanic 39 (15.7)
Native 2 ( 0.8)
Other 1 ( 0.4)
White 162 (65.1)
PreinvasiveComponent = Present (%) 63 (25.3)
LVI = Present (%) 92 (36.9)
PNI = Present (%) 78 (31.3)
Death = TRUE (%) 176 (70.7)
Group = Treatment (%) 118 (47.4)
Grade (%)
1 83 (33.3)
2 65 (26.1)
3 101 (40.6)
TStage (%)
1 32 (12.9)
2 50 (20.1)
3 66 (26.5)
4 101 (40.6)
Anti-X-intensity (mean (SD)) 2.43 (0.64)
Anti-Y-intensity (mean (SD)) 2.06 (0.78)
LymphNodeMetastasis = Present (%) 96 (38.6)
Valid = TRUE (%) 112 (45.0)
Smoker = TRUE (%) 124 (49.8)
Grade_Level (%)
high 100 (40.2)
low 77 (30.9)
moderate 72 (28.9)
DeathTime = Within1Year (%) 149 (59.6)
Descriptive Statistics of Continuous Variables
mydata %>% select(continiousVariables, numericVariables, integerVariables) %>% summarytools::descr(.,
    style = "rmarkdown")
 variable type na na_pct unique min mean max
1 Sex chr 1 0.4 3 NA NA NA
2 PreinvasiveComponent chr 1 0.4 3 NA NA NA
3 LVI chr 1 0.4 3 NA NA NA
4 PNI chr 1 0.4 3 NA NA NA
5 Death lgl 1 0.4 3 0 0.71 1
6 Group chr 1 0.4 3 NA NA NA
7 Grade chr 1 0.4 4 NA NA NA
8 Anti-X-intensity dbl 1 0.4 4 1 2.43 3
9 Anti-Y-intensity dbl 1 0.4 4 1 2.06 3
10 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
11 Valid lgl 1 0.4 3 0 0.45 1
12 Smoker lgl 1 0.4 3 0 0.50 1
13 Grade_Level chr 1 0.4 4 NA NA NA
14 DeathTime chr 0 0.0 2 NA NA NA
variable type na na_pct unique min mean max
1 Name chr 1 0.4 250 NA NA NA
2 Sex chr 1 0.4 3 NA NA NA
3 Age dbl 1 0.4 50 25 50.39 73
4 Race chr 1 0.4 8 NA NA NA
5 PreinvasiveComponent chr 1 0.4 3 NA NA NA
6 LVI chr 1 0.4 3 NA NA NA
7 PNI chr 1 0.4 3 NA NA NA
8 LastFollowUpDate dat 1 0.4 13 NA NA NA
9 Death lgl 1 0.4 3 0 0.71 1
10 Group chr 1 0.4 3 NA NA NA
11 Grade chr 1 0.4 4 NA NA NA
12 TStage chr 1 0.4 5 NA NA NA
13 Anti-X-intensity dbl 1 0.4 4 1 2.43 3
14 Anti-Y-intensity dbl 1 0.4 4 1 2.06 3
15 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
16 Valid lgl 1 0.4 3 0 0.45 1
17 Smoker lgl 1 0.4 3 0 0.50 1
18 Grade_Level chr 1 0.4 4 NA NA NA
19 SurgeryDate dat 1 0.4 221 NA NA NA
variable type na na_pct unique min mean max
1 ID chr 0 0.0 250 NA NA NA
2 Name chr 1 0.4 250 NA NA NA
3 Sex chr 1 0.4 3 NA NA NA
4 Age dbl 1 0.4 50 25 50.39 73
5 Race chr 1 0.4 8 NA NA NA
6 PreinvasiveComponent chr 1 0.4 3 NA NA NA
7 LVI chr 1 0.4 3 NA NA NA
8 PNI chr 1 0.4 3 NA NA NA
9 LastFollowUpDate dat 1 0.4 13 NA NA NA
10 Death lgl 1 0.4 3 0 0.71 1
11 Group chr 1 0.4 3 NA NA NA
12 Grade chr 1 0.4 4 NA NA NA
13 TStage chr 1 0.4 5 NA NA NA
14 Anti-X-intensity dbl 1 0.4 4 1 2.43 3
15 Anti-Y-intensity dbl 1 0.4 4 1 2.06 3
16 LymphNodeMetastasis chr 1 0.4 3 NA NA NA
17 Valid lgl 1 0.4 3 0 0.45 1
18 Smoker lgl 1 0.4 3 0 0.50 1
19 Grade_Level chr 1 0.4 4 NA NA NA
20 SurgeryDate dat 1 0.4 221 NA NA NA
21 DeathTime chr 0 0.0 2 NA NA NA
Use R/gc_desc_cat.R to generate gc_desc_cat.Rmd containing descriptive statistics for categorical variables
mydata %>% janitor::tabyl(Sex) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| Sex | n | percent | valid_percent |
|---|---|---|---|
| Female | 123 | 49.2% | 49.4% |
| Male | 126 | 50.4% | 50.6% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(Race) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| Race | n | percent | valid_percent |
|---|---|---|---|
| Asian | 15 | 6.0% | 6.0% |
| Bi-Racial | 4 | 1.6% | 1.6% |
| Black | 26 | 10.4% | 10.4% |
| Hispanic | 39 | 15.6% | 15.7% |
| Native | 2 | 0.8% | 0.8% |
| Other | 1 | 0.4% | 0.4% |
| White | 162 | 64.8% | 65.1% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(PreinvasiveComponent) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| PreinvasiveComponent | n | percent | valid_percent |
|---|---|---|---|
| Absent | 186 | 74.4% | 74.7% |
| Present | 63 | 25.2% | 25.3% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(LVI) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| LVI | n | percent | valid_percent |
|---|---|---|---|
| Absent | 157 | 62.8% | 63.1% |
| Present | 92 | 36.8% | 36.9% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(PNI) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| PNI | n | percent | valid_percent |
|---|---|---|---|
| Absent | 171 | 68.4% | 68.7% |
| Present | 78 | 31.2% | 31.3% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(Group) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| Group | n | percent | valid_percent |
|---|---|---|---|
| Control | 131 | 52.4% | 52.6% |
| Treatment | 118 | 47.2% | 47.4% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(Grade) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| Grade | n | percent | valid_percent |
|---|---|---|---|
| 1 | 83 | 33.2% | 33.3% |
| 2 | 65 | 26.0% | 26.1% |
| 3 | 101 | 40.4% | 40.6% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(TStage) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| TStage | n | percent | valid_percent |
|---|---|---|---|
| 1 | 32 | 12.8% | 12.9% |
| 2 | 50 | 20.0% | 20.1% |
| 3 | 66 | 26.4% | 26.5% |
| 4 | 101 | 40.4% | 40.6% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(LymphNodeMetastasis) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| LymphNodeMetastasis | n | percent | valid_percent |
|---|---|---|---|
| Absent | 153 | 61.2% | 61.4% |
| Present | 96 | 38.4% | 38.6% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(Grade_Level) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| Grade_Level | n | percent | valid_percent |
|---|---|---|---|
| high | 100 | 40.0% | 40.2% |
| low | 77 | 30.8% | 30.9% |
| moderate | 72 | 28.8% | 28.9% |
| NA | 1 | 0.4% | - |
mydata %>% janitor::tabyl(DeathTime) %>% janitor::adorn_pct_formatting(rounding = "half up",
    digits = 1) %>% knitr::kable()
| DeathTime | n | percent |
|---|---|---|
| MoreThan1Year | 101 | 40.4% |
| Within1Year | 149 | 59.6% |
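The two percentage columns in the tables above differ only in their denominator: `percent` divides by all observations, whereas `valid_percent` divides by the non-missing ones. A minimal base-R sketch using the Sex counts reported above (123 Female, 126 Male, 1 missing) makes the arithmetic explicit; it illustrates the calculation and is not the `janitor` implementation.

```r
# Rebuild the Sex column from the reported counts
sex <- c(rep("Female", 123), rep("Male", 126), NA)

# percent: denominator is every observation, including the missing one
counts  <- table(sex, useNA = "ifany")
percent <- round(100 * counts / length(sex), 1)

# valid_percent: denominator is the number of non-missing observations
valid_counts  <- table(sex)
valid_percent <- round(100 * valid_counts / sum(!is.na(sex)), 1)

percent
valid_percent
```

With a single missing value among 250 cases the two columns differ by at most a few tenths of a percentage point, but the gap grows with the share of missing data, so it is worth stating which denominator a reported percentage uses.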
race_stats <- summarytools::freq(mydata$Race)
print(race_stats, report.nas = FALSE, totals = FALSE, display.type = FALSE, Variable.label = "Race Group")
variable = PreinvasiveComponent
type = character
na = 1 of 250 (0.4%)
unique = 3
Absent = 186 (74.4%)
Present = 63 (25.2%)
NA = 1 (0.4%)
## Frequency or custom tables for categorical variables
SmartEDA::ExpCTable(mydata, Target = NULL, margin = 1, clim = 10, nlim = 5, round = 2,
    bin = NULL, per = TRUE)
 Variable Valid Frequency Percent CumPercent
1 Sex Female 123 49.2 49.2
2 Sex Male 126 50.4 99.6
3 Sex NA 1 0.4 100.0
4 Sex TOTAL 250 NA NA
5 Race Asian 15 6.0 6.0
6 Race Bi-Racial 4 1.6 7.6
7 Race Black 26 10.4 18.0
8 Race Hispanic 39 15.6 33.6
9 Race NA 1 0.4 34.0
10 Race Native 2 0.8 34.8
11 Race Other 1 0.4 35.2
12 Race White 162 64.8 100.0
13 Race TOTAL 250 NA NA
14 PreinvasiveComponent Absent 186 74.4 74.4
15 PreinvasiveComponent NA 1 0.4 74.8
16 PreinvasiveComponent Present 63 25.2 100.0
17 PreinvasiveComponent TOTAL 250 NA NA
18 LVI Absent 157 62.8 62.8
19 LVI NA 1 0.4 63.2
20 LVI Present 92 36.8 100.0
21 LVI TOTAL 250 NA NA
22 PNI Absent 171 68.4 68.4
23 PNI NA 1 0.4 68.8
24 PNI Present 78 31.2 100.0
25 PNI TOTAL 250 NA NA
26 Group Control 131 52.4 52.4
27 Group NA 1 0.4 52.8
28 Group Treatment 118 47.2 100.0
29 Group TOTAL 250 NA NA
30 Grade 1 83 33.2 33.2
31 Grade 2 65 26.0 59.2
32 Grade 3 101 40.4 99.6
33 Grade NA 1 0.4 100.0
34 Grade TOTAL 250 NA NA
35 TStage 1 32 12.8 12.8
36 TStage 2 50 20.0 32.8
37 TStage 3 66 26.4 59.2
38 TStage 4 101 40.4 99.6
39 TStage NA 1 0.4 100.0
40 TStage TOTAL 250 NA NA
41 LymphNodeMetastasis Absent 153 61.2 61.2
42 LymphNodeMetastasis NA 1 0.4 61.6
43 LymphNodeMetastasis Present 96 38.4 100.0
44 LymphNodeMetastasis TOTAL 250 NA NA
45 Grade_Level high 100 40.0 40.0
46 Grade_Level low 77 30.8 70.8
47 Grade_Level moderate 72 28.8 99.6
48 Grade_Level NA 1 0.4 100.0
49 Grade_Level TOTAL 250 NA NA
50 DeathTime MoreThan1Year 101 40.4 40.4
51 DeathTime Within1Year 149 59.6 100.0
52 DeathTime TOTAL 250 NA NA
53 Anti-X-intensity 1 20 8.0 8.0
54 Anti-X-intensity 2 102 40.8 48.8
55 Anti-X-intensity 3 127 50.8 99.6
56 Anti-X-intensity NA 1 0.4 100.0
57 Anti-X-intensity TOTAL 250 NA NA
58 Anti-Y-intensity 1 68 27.2 27.2
59 Anti-Y-intensity 2 98 39.2 66.4
60 Anti-Y-intensity 3 83 33.2 99.6
61 Anti-Y-intensity NA 1 0.4 100.0
62 Anti-Y-intensity TOTAL 250 NA NA
# A tibble: 16 x 5
col_name cnt common common_pcnt levels
<chr> <int> <chr> <dbl> <named list>
1 Death 3 TRUE 70.4 <tibble [3 × 3]>
2 DeathTime 2 Within1Year 59.6 <tibble [2 × 3]>
3 Grade 4 3 40.4 <tibble [4 × 3]>
4 Grade_Level 4 high 40 <tibble [4 × 3]>
5 Group 3 Control 52.4 <tibble [3 × 3]>
6 ID 250 001 0.4 <tibble [250 × 3]>
7 LVI 3 Absent 62.8 <tibble [3 × 3]>
8 LymphNodeMetastasis 3 Absent 61.2 <tibble [3 × 3]>
9 Name 250 Aansh 0.4 <tibble [250 × 3]>
10 PNI 3 Absent 68.4 <tibble [3 × 3]>
11 PreinvasiveComponent 3 Absent 74.4 <tibble [3 × 3]>
12 Race 8 White 64.8 <tibble [8 × 3]>
13 Sex 3 Male 50.4 <tibble [3 × 3]>
14 Smoker 3 FALSE 50 <tibble [3 × 3]>
15 TStage 5 4 40.4 <tibble [5 × 3]>
16 Valid 3 FALSE 54.8 <tibble [3 × 3]>
# A tibble: 3 x 3
value prop cnt
<chr> <dbl> <int>
1 Control 0.524 131
2 Treatment 0.472 118
3 <NA> 0.004 1
summarytools::stby(list(x = mydata$LVI, y = mydata$LymphNodeMetastasis), mydata$PNI,
    summarytools::ctable)
SmartEDA::ExpCTable(mydata, Target = "Sex", margin = 1, clim = 10, nlim = NULL, round = 2,
    bin = 4, per = FALSE)
 VARIABLE CATEGORY Sex:Female Sex:Male Sex:NA TOTAL
1 Race Asian 10 5 0 15
2 Race Bi-Racial 2 2 0 4
3 Race Black 11 15 0 26
4 Race Hispanic 22 17 0 39
5 Race NA 0 1 0 1
6 Race Native 1 1 0 2
7 Race Other 1 0 0 1
8 Race White 76 85 1 162
9 Race TOTAL 123 126 1 250
10 PreinvasiveComponent Absent 91 94 1 186
11 PreinvasiveComponent NA 1 0 0 1
12 PreinvasiveComponent Present 31 32 0 63
13 PreinvasiveComponent TOTAL 123 126 1 250
14 LVI Absent 84 72 1 157
15 LVI NA 0 1 0 1
16 LVI Present 39 53 0 92
17 LVI TOTAL 123 126 1 250
18 PNI Absent 78 93 0 171
19 PNI NA 0 0 1 1
20 PNI Present 45 33 0 78
21 PNI TOTAL 123 126 1 250
22 Group Control 71 60 0 131
23 Group NA 1 0 0 1
24 Group Treatment 51 66 1 118
25 Group TOTAL 123 126 1 250
26 Grade 1 36 47 0 83
27 Grade 2 37 27 1 65
28 Grade 3 50 51 0 101
29 Grade NA 0 1 0 1
30 Grade TOTAL 123 126 1 250
31 TStage 1 14 18 0 32
32 TStage 2 25 25 0 50
33 TStage 3 28 38 0 66
34 TStage 4 56 44 1 101
35 TStage NA 0 1 0 1
36 TStage TOTAL 123 126 1 250
37 LymphNodeMetastasis Absent 76 76 1 153
38 LymphNodeMetastasis NA 1 0 0 1
39 LymphNodeMetastasis Present 46 50 0 96
40 LymphNodeMetastasis TOTAL 123 126 1 250
41 Grade_Level high 51 48 1 100
42 Grade_Level low 35 42 0 77
43 Grade_Level moderate 36 36 0 72
44 Grade_Level NA 1 0 0 1
45 Grade_Level TOTAL 123 126 1 250
46 DeathTime MoreThan1Year 44 57 0 101
47 DeathTime Within1Year 79 69 1 149
48 DeathTime TOTAL 123 126 1 250
49 Anti-X-intensity 1 7 13 0 20
50 Anti-X-intensity 2 54 47 1 102
51 Anti-X-intensity 3 61 66 0 127
52 Anti-X-intensity NA 1 0 0 1
53 Anti-X-intensity TOTAL 123 126 1 250
54 Anti-Y-intensity 1 36 32 0 68
55 Anti-Y-intensity 2 50 48 0 98
56 Anti-Y-intensity 3 36 46 1 83
57 Anti-Y-intensity NA 1 0 0 1
58 Anti-Y-intensity TOTAL 123 126 1 250
mydata %>% select(characterVariables) %>% select(PreinvasiveComponent, PNI, LVI) %>%
    reactable::reactable(data = ., groupBy = c("PreinvasiveComponent", "PNI"), columns = list(LVI = reactable::colDef(aggregate = "count")))
Descriptive Statistics Age
mydata %>% jmv::descriptives(data = ., vars = "Age", hist = TRUE, dens = TRUE, box = TRUE,
violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE, skew = TRUE,
kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────
Age
──────────────────────────────────
N 249
Missing 1
Mean 50.4
Median 50.0
Mode 72.0
Standard deviation 14.1
Variance 198
Minimum 25.0
Maximum 73.0
Skewness -0.0364
Std. error skewness 0.154
Kurtosis -1.17
Std. error kurtosis 0.307
25th percentile 38.0
50th percentile 50.0
75th percentile 63.0
──────────────────────────────────
Descriptive Statistics Anti-X-intensity
mydata %>% jmv::descriptives(data = ., vars = "Anti-X-intensity", hist = TRUE, dens = TRUE,
box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE,
skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
───────────────────────────────────────────
Anti-X-intensity
───────────────────────────────────────────
N 249
Missing 1
Mean 2.43
Median 3.00
Mode 3.00
Standard deviation 0.638
Variance 0.407
Minimum 1.00
Maximum 3.00
Skewness -0.672
Std. error skewness 0.154
Kurtosis -0.535
Std. error kurtosis 0.307
25th percentile 2.00
50th percentile 3.00
75th percentile 3.00
───────────────────────────────────────────
Descriptive Statistics Anti-Y-intensity
mydata %>% jmv::descriptives(data = ., vars = "Anti-Y-intensity", hist = TRUE, dens = TRUE,
box = TRUE, violin = TRUE, dot = TRUE, mode = TRUE, sd = TRUE, variance = TRUE,
skew = TRUE, kurt = TRUE, quart = TRUE)
DESCRIPTIVES
Descriptives
───────────────────────────────────────────
Anti-Y-intensity
───────────────────────────────────────────
N 249
Missing 1
Mean 2.06
Median 2.00
Mode 2.00
Standard deviation 0.778
Variance 0.605
Minimum 1.00
Maximum 3.00
Skewness -0.105
Std. error skewness 0.154
Kurtosis -1.34
Std. error kurtosis 0.307
25th percentile 1.00
50th percentile 2.00
75th percentile 3.00
───────────────────────────────────────────
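The standard errors of skewness and kurtosis in the tables above depend only on the number of non-missing observations, which is why all three variables show the same values. The sketch below applies the standard large-sample formulas with n = 249; that `jmv` uses exactly these formulas is an assumption, although the results agree with the printed output to three decimals.

```r
n <- 249  # non-missing observations reported in the tables above

# Standard error of skewness
se_skewness <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Standard error of kurtosis, expressed through the skewness standard error
se_kurtosis <- 2 * se_skewness * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))

round(c(se_skewness = se_skewness, se_kurtosis = se_kurtosis), 3)
```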
Overall
n 250
Age (mean (SD)) 50.39 (14.06)
Anti-X-intensity (mean (SD)) 2.43 (0.64)
Anti-Y-intensity (mean (SD)) 2.06 (0.78)
Overall
n 250
Age (mean (SD)) 50.39 (14.06)
Anti-X-intensity (median [IQR]) 3.00 [2.00, 3.00]
Anti-Y-intensity (mean (SD)) 2.06 (0.78)
variable = Age
type = double
na = 1 of 250 (0.4%)
unique = 50
min|max = 25 | 73
q05|q95 = 28 | 72
q25|q75 = 38 | 63
median = 50
mean = 50.38956
mydata %>% select(continiousVariables) %>% SmartEDA::ExpNumStat(data = ., by = "A",
    gp = NULL, Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
# A tibble: 3 x 10
col_name min q1 median mean q3 max sd pcnt_na hist
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <named list>
1 Age 25 38 50 50.4 63 73 14.1 0.4 <tibble [12…
2 Anti-X-inten… 1 2 3 2.43 3 3 0.638 0.4 <tibble [12…
3 Anti-Y-inten… 1 1 2 2.06 3 3 0.778 0.4 <tibble [12…
# A tibble: 27 x 2
value prop
<chr> <dbl>
1 [-Inf, 24) 0
2 [24, 26) 0.00803
3 [26, 28) 0.0402
4 [28, 30) 0.0201
5 [30, 32) 0.0442
6 [32, 34) 0.0321
7 [34, 36) 0.0442
8 [36, 38) 0.0402
9 [38, 40) 0.0482
10 [40, 42) 0.0361
# … with 17 more rows
summarytools::stby(data = mydata, INDICES = mydata$PreinvasiveComponent, FUN = summarytools::descr,
    stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE)
with(mydata, summarytools::stby(Age, PreinvasiveComponent, summarytools::descr,
    stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE))
## Summary statistics by category
SmartEDA::ExpNumStat(mydata, by = "GA", gp = "PreinvasiveComponent", Qnt = seq(0,
    1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)
 Vname Group TN nNeg nZero nPos NegInf PosInf NA_Value
1 Age PreinvasiveComponent:All 250 0 0 249 0 0 1
2 Age PreinvasiveComponent:Absent 186 0 0 185 0 0 1
3 Age PreinvasiveComponent:Present 63 0 0 63 0 0 0
4 Age PreinvasiveComponent:NA 0 0 0 0 0 0 0
Per_of_Missing sum min max mean median SD CV IQR Skewness Kurtosis
1 0.40 12547 25 73 50.39 50 14.06 0.28 25.0 -0.04 -1.17
2 0.54 9357 25 73 50.58 50 14.16 0.28 24.0 -0.05 -1.18
3 0.00 3150 25 73 50.00 50 13.89 0.28 22.5 -0.04 -1.16
4 NaN 0 Inf -Inf NaN NA NA NA NA NaN NaN
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% LB.25% UB.75% nOutliers
1 25 31.0 36 40.4 46.2 50 54.8 60.0 65.0 70 73 0.50 100.50 0
2 25 31.0 36 42.0 46.6 50 55.0 60.8 65.0 70 73 3.00 99.00 0
3 25 31.4 36 39.2 46.8 50 54.2 60.0 63.8 69 73 4.25 94.25 0
4 NA NA NA NA NA NA NA NA NA NA NA NA NA 0
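The grouped output above reports the same statistics for each level of `PreinvasiveComponent`. As a minimal base R sketch of that idea, `aggregate()` can compute the mean, SD, minimum, median, and maximum per group. The data frame `toy` and its values are hypothetical and are not the study data.

```r
# Hypothetical toy data standing in for mydata (values are made up)
toy <- data.frame(
  Age = c(25, 40, 55, 73, 30, 50, 70),
  PreinvasiveComponent = c("Absent", "Absent", "Absent", "Absent",
                           "Present", "Present", "Present")
)

# One row per group; the summary function returns a named vector
by_group <- aggregate(
  Age ~ PreinvasiveComponent,
  data = toy,
  FUN = function(x) c(mean = mean(x), sd = sd(x), min = min(x),
                      median = median(x), max = max(x))
)

# aggregate() stores the named vector as a matrix column; flatten it
by_group <- do.call(data.frame, by_group)
print(by_group)
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, so the per-group counts can differ from the raw group sizes.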
Codes for Survival Analysis24
https://link.springer.com/article/10.1007/s00701-019-04096-9
Calculate survival time
mydata$int <- lubridate::interval(lubridate::ymd(mydata$SurgeryDate), lubridate::ymd(mydata$LastFollowUpDate))
mydata$OverallTime <- lubridate::time_length(mydata$int, "month")
mydata$OverallTime <- round(mydata$OverallTime, digits = 1)
Recode death status outcome as numbers for survival analysis
## Recoding mydata$Death into mydata$Outcome
mydata$Outcome <- forcats::fct_recode(as.character(mydata$Death), `1` = "TRUE", `0` = "FALSE")
mydata$Outcome <- as.numeric(as.character(mydata$Outcome))
It is always good practice to double-check after recoding25
0 1
FALSE 73 0
TRUE 0 176
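The cross-tabulation above shows every `FALSE` mapped to 0 and every `TRUE` mapped to 1. The following is a minimal sketch of the same check, assuming `Death` is a logical column. It uses a hypothetical toy vector rather than `mydata`, and it swaps `forcats::fct_recode()` for a plain `as.numeric()` conversion.

```r
# Hypothetical stand-in for mydata$Death (made-up values, not the study data)
death <- c(TRUE, FALSE, TRUE, NA, FALSE, TRUE)

# Recode logical death status to 1 (event) / 0 (censored); NA stays NA
outcome <- as.numeric(death)

# Cross-tabulate original against recoded values, keeping NA visible
check <- table(death, outcome, useNA = "ifany")
print(check)

# Fail loudly if any TRUE is not 1 or any FALSE is not 0
stopifnot(all(outcome[which(death)] == 1),
          all(outcome[which(!death)] == 0))
```

Keeping `useNA = "ifany"` in the table makes missing values show up as their own row and column instead of silently disappearing.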
library(survival)
# data(lung) km <- with(lung, Surv(time, status))
km <- with(mydata, Surv(OverallTime, Outcome))
head(km, 80)
[1] 10.0 9.7 3.3 11.0 10.0+ 3.7 5.4 11.2+ 10.1 7.6+ 10.2+ 4.0
[13] 10.7 9.8 8.3 8.8 4.9 7.3? 6.6 4.7 10.5+ 6.6 6.3 10.7
[25] 5.2+ 10.9 8.5 9.8 7.1 7.8 9.5 11.6+ 7.8 5.3+ 4.5+ 4.7
[37] 6.7 9.5+ 5.4 10.4 6.3 6.3 5.4 5.0+ 3.3 3.5 11.8+ 9.8
[49] 5.9 5.9 10.0 5.3 6.0 9.7 8.0+ 8.9 3.2 7.1 7.3+ 5.3
[61] 3.8 5.7 5.9+ 3.3 11.8 6.5 3.4+ 11.2 8.9 11.1+ 6.4+ 9.2+
[73] 7.0 8.9 6.2 7.9 9.0+ 5.6+ 7.6+ 11.0
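In the printed `Surv` object above, a trailing `+` marks a censored follow-up time (the patient was alive at last follow-up), and a `?` appears to mark an observation whose status is missing. A small sketch with hypothetical times and statuses shows how the object is built. The vectors `time_months` and `status` are made up for illustration.

```r
library(survival)

# Hypothetical follow-up times in months and event indicators
# (1 = death observed, 0 = censored, NA = status unknown)
time_months <- c(10.0, 9.7, 3.3, 10.0, 7.3)
status <- c(1, 1, 1, 0, NA)

# Surv() pairs each time with its status
km_toy <- Surv(time_months, status)
print(km_toy)

# Count how many observations are censored (status 0)
n_censored <- sum(status == 0, na.rm = TRUE)
```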
Kaplan-Meier Plot Log-Rank Test
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "LVI"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
# Drawing Survival Curves Using ggplot2
# https://rpkgs.datanovia.com/survminer/reference/ggsurvplot.html
mydata %>%
finalfit::surv_plot(.data = .,
dependent = "Surv(OverallTime, Outcome)",
explanatory = "LVI",
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
library(finalfit)
library(survival)
explanatoryUni <- "LVI"
dependentUni <- "Surv(OverallTime, Outcome)"
tUni <- mydata %>% finalfit::finalfit(dependentUni, explanatoryUni)
knitr::kable(tUni, row.names = FALSE, align = c("l", "l", "r", "r", "r", "r"))

| Dependent: Surv(OverallTime, Outcome) | | all | HR (univariable) | HR (multivariable) |
|---|---|---|---|---|
| LVI | Absent | 157 (100.0) | NA | NA |
| | Present | 92 (100.0) | 2.02 (1.47-2.78, p<0.001) | 2.02 (1.47-2.78, p<0.001) |
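A hazard ratio like the one in the table can be cross-checked by fitting the Cox model directly with `survival::coxph()`. Because `mydata` is not available here, this sketch uses the `lung` dataset that ships with the survival package, with `sex` as the explanatory variable. The names `lung_df`, `event`, and `hr_table` are illustrative assumptions, not part of the original analysis.

```r
library(survival)

# lung ships with survival; status is coded 1 = censored, 2 = dead
lung_df <- survival::lung
lung_df$event <- as.numeric(lung_df$status == 2)
lung_df$sex <- factor(lung_df$sex, levels = c(1, 2),
                      labels = c("Male", "Female"))

# Univariable Cox proportional hazards model
cox_fit <- coxph(Surv(time, event) ~ sex, data = lung_df)
cox_sum <- summary(cox_fit)

# conf.int holds exp(coef), i.e. the hazard ratio, with 95% limits
hr_table <- cox_sum$conf.int[, c("exp(coef)", "lower .95", "upper .95"),
                             drop = FALSE]
print(round(hr_table, 2))
```

The first factor level (`Male` here, `Absent` in the table above) acts as the reference category, which is why it has no hazard ratio of its own.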
tUni_df <- tibble::as_tibble(tUni, .name_repair = "minimal") %>% janitor::clean_names()
tUni_df_descr <- paste0("When ", tUni_df$dependent_surv_overall_time_outcome[1],
    " is ", tUni_df$x[2], ", there is ", tUni_df$hr_univariable[2], " times the risk compared with ",
    "when ", tUni_df$dependent_surv_overall_time_outcome[1], " is ", tUni_df$x[1],
    ".")
When LVI is Present, there is 2.02 (1.47-2.78, p<0.001) times the risk compared with when LVI is Absent.
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
3 observations deleted due to missingness
n events median 0.95LCL 0.95UCL
LVI=Absent 157 111 22.6 15.3 29.4
LVI=Present 90 64 9.8 8.7 13.3
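The `Call:` line above indicates that the fit was created with `survfit()` and the formula `Surv(OverallTime, Outcome) ~ LVI`. As a sketch of that step on data that can be run here, the block below fits the same kind of model to `survival::lung` and adds a log-rank test with `survdiff()`. That the p-value drawn by `surv_plot(pval = TRUE)` comes from this same test is an assumption. The names `km_fit_toy` and `logrank` are illustrative.

```r
library(survival)

# lung ships with survival; status is coded 1 = censored, 2 = dead
lung_df <- survival::lung
lung_df$event <- as.numeric(lung_df$status == 2)

# Kaplan-Meier fit by group; printing shows n, events, median and its 95% CI
km_fit_toy <- survfit(Surv(time, event) ~ sex, data = lung_df)
print(km_fit_toy)

# Log-rank test comparing the two survival curves
logrank <- survdiff(Surv(time, event) ~ sex, data = lung_df)
p_value <- 1 - pchisq(logrank$chisq, df = length(logrank$n) - 1)
print(signif(p_value, 2))
```

Rows with a missing time, status, or grouping value are dropped before fitting, which is what the "observations deleted due to missingness" message reports.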
km_fit_median_df <- summary(km_fit)
km_fit_median_df <- as.data.frame(km_fit_median_df$table) %>% janitor::clean_names() %>%
    tibble::rownames_to_column()
km_fit_median_definition <- km_fit_median_df %>% dplyr::mutate(description = glue::glue("When {rowname}, median survival is {median} [{x0_95lcl} - {x0_95ucl}, 95% CI] months.")) %>%
    dplyr::select(description) %>% pull()
When LVI=Absent, median survival is 22.6 [15.3 - 29.4, 95% CI] months.
When LVI=Present, median survival is 9.8 [8.7 - 13.3, 95% CI] months.
Call: survfit(formula = Surv(OverallTime, Outcome) ~ LVI, data = mydata)
3 observations deleted due to missingness
LVI=Absent
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 80 54 0.623 0.0408 0.548 0.708
36 21 42 0.241 0.0412 0.172 0.337
LVI=Present
time n.risk n.event survival std.err lower 95% CI upper 95% CI
12 17 47 0.3630 0.0597 0.2629 0.501
36 2 15 0.0427 0.0292 0.0112 0.163
km_fit_summary <- summary(km_fit, times = c(12, 36, 60))
km_fit_df <- as.data.frame(km_fit_summary[c("strata", "time", "n.risk", "n.event",
    "surv", "std.err", "lower", "upper")])
km_fit_definition <- km_fit_df %>% dplyr::mutate(description = glue::glue("When {strata}, {time} month survival is {scales::percent(surv)} [{scales::percent(lower)}-{scales::percent(upper)}, 95% CI].")) %>%
    dplyr::select(description) %>% pull()
When LVI=Absent, 12 month survival is 62% [54.8%-71%, 95% CI].
When LVI=Absent, 36 month survival is 24% [17.2%-34%, 95% CI].
When LVI=Present, 12 month survival is 36% [26.3%-50%, 95% CI].
When LVI=Present, 36 month survival is 4% [1.1%-16%, 95% CI].
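The code above requests survival at 12, 36, and 60 months, but only the 12- and 36-month rows appear. A likely reason is that `summary.survfit()` drops requested times that lie beyond the last follow-up time in a stratum unless `extend = TRUE` is set. The sketch below illustrates this on `survival::lung` (times in days), using base `sprintf()` in place of glue. The names `km_fit_toy` and `km_times_df` are illustrative.

```r
library(survival)

# lung ships with survival; status is coded 1 = censored, 2 = dead
lung_df <- survival::lung
lung_df$event <- as.numeric(lung_df$status == 2)
km_fit_toy <- survfit(Surv(time, event) ~ sex, data = lung_df)

# 1095 days is past the last follow-up in both groups;
# extend = TRUE keeps those rows instead of dropping them
km_times <- summary(km_fit_toy, times = c(365, 730, 1095), extend = TRUE)
km_times_df <- as.data.frame(
  km_times[c("strata", "time", "n.risk", "n.event", "surv", "lower", "upper")]
)

# One descriptive sentence per row, with consistent one-decimal rounding
km_times_df$description <- sprintf(
  "When %s, %.0f-day survival is %.1f%% [%.1f%%-%.1f%%, 95%% CI].",
  as.character(km_times_df$strata), km_times_df$time,
  100 * km_times_df$surv, 100 * km_times_df$lower, 100 * km_times_df$upper
)
print(km_times_df$description)
```

Rows produced by `extend = TRUE` have nobody left at risk, so they carry the last estimate forward and should be reported with that caveat.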
dependentKM <- "Surv(OverallTime, Outcome)"
explanatoryKM <- "TStage"
mydata %>%
finalfit::surv_plot(.data = .,
dependent = dependentKM,
explanatory = explanatoryKM,
xlab='Time (months)',
pval=TRUE,
legend = 'none',
break.time.by = 12,
xlim = c(0,60)
# legend.labs = c('a','b')
)
Interpret the results in the context of the working hypothesis elaborated in the introduction and other relevant studies; include a discussion of the limitations of the study.
Discuss potential clinical applications and implications for future research
Knijn, N., F. Simmer, and I. D. Nagtegaal. 2015. “Recommendations for Reporting Histopathology Studies: A Proposal.” Virchows Archiv 466 (6): 611–15. https://doi.org/10.1007/s00428-015-1762-3.
Schmidt, Robert L., Deborah J. Chute, Jorie M. Colbert-Getz, Adolfo Firpo-Betancourt, Daniel S. James, Julie K. Karp, Douglas C. Miller, et al. 2017. “Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty.” Archives of Pathology & Laboratory Medicine 141 (2): 279–87. https://doi.org/10.5858/arpa.2016-0200-OA.
From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3↩︎
From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3↩︎
See childRmd/_01header.Rmd file for other general settings↩︎
Change echo = FALSE to hide codes after knitting.↩︎
See childRmd/_02fakeData.Rmd file for other codes↩︎
Synthea The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures. BMC Med Inform Decis Mak 19, 44 (2019) doi:10.1186/s12911-019-0793-0↩︎
https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0793-0↩︎
https://medium.com/free-code-camp/how-our-test-data-generator-makes-fake-data-look-real-ace01c5bde4a↩︎
Combine with the lung, cancer, and breast datasets↩︎
See childRmd/_03importData.Rmd file for other codes↩︎
See childRmd/_04briefSummary.Rmd file for other codes↩︎
Recording personal data, and unlawfully disclosing or obtaining personal data, are defined as crimes in our legal system under Articles 135 and 136 of the Turkish Penal Code. The penalty for the crime of recording personal data is 1 to 3 years of imprisonment. The aggravated form of the crime is its commission by a public official through abuse of the authority conferred by their office, or by taking advantage of the convenience provided by a particular profession or trade; in that case the penalty is 1.5 to 4.5 years of imprisonment.↩︎
See childRmd/_06variableTypes.Rmd file for other codes↩︎
See childRmd/_07overView.Rmd file for other codes↩︎
Statistical Literacy Among Academic Pathologists: A Survey Study to Gauge Knowledge of Frequently Used Statistical Tests Among Trainees and Faculty. Archives of Pathology & Laboratory Medicine: February 2017, Vol. 141, No. 2, pp. 279-287. https://doi.org/10.5858/arpa.2016-0200-OA↩︎
From Table 1: Proposed items for reporting histopathology studies. Recommendations for reporting histopathology studies: a proposal Virchows Arch (2015) 466:611–615 DOI 10.1007/s00428-015-1762-3↩︎
See childRmd/_11descriptives.Rmd file for other codes↩︎
See childRmd/_18survival.Rmd file for other codes, and childRmd/_19shinySurvival.Rmd for shiny application↩︎
JAMA retraction after miscoding – new Finalfit function to check recoding↩︎
See childRmd/_23footer.Rmd file for other codes↩︎
Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86 https://www.force11.org/software-citation-principles↩︎
A work by Serdar Balci
drserdarbalci@gmail.com